AITopics | bias analysis

Unveiling Modality Bias: Automated Sample-Specific Analysis for Multimodal Misinformation Benchmarks

Lin, Hehai, Liu, Hui, Cao, Shilei, Li, Jing, Li, Haoliang, Wang, Wenya

arXiv.org Artificial Intelligence | Nov-11-2025

Numerous multimodal misinformation benchmarks exhibit bias toward specific modalities, allowing detectors to make predictions based solely on one modality. While previous research has quantified bias at the dataset level or manually identified spurious correlations between modalities and labels, these approaches lack meaningful insights at the sample level and struggle to scale to the vast amount of online information. In this paper, we investigate the design for automated recognition of modality bias at the sample level. Specifically, we propose three bias quantification methods based on theories/views of different levels of granularity: 1) a coarse-grained evaluation of modality benefit; 2) a medium-grained quantification of information flow; and 3) a fine-grained causality analysis. To verify the effectiveness, we conduct a human evaluation on two popular benchmarks. Experimental results reveal three interesting findings that provide potential directions for future research: 1) Ensembling multiple views is crucial for reliable automated analysis; 2) Automated analysis is prone to detector-induced fluctuations; and 3) Different views produce a higher agreement on modality-balanced samples but diverge on biased ones.

data mining, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2511.05883

Country: Asia > China (0.46)

Genre: Research Report (0.81)

Industry: Media > News (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(4 more...)

Bias Analysis and Mitigation through Protected Attribute Detection and Regard Classification

Udagawa, Takuma, Zhao, Yang, Kanayama, Hiroshi, Bhattacharjee, Bishwaranjan

arXiv.org Artificial Intelligence | Sep-4-2025

Large language models (LLMs) acquire general linguistic knowledge from massive-scale pretraining. However, pretraining data mainly comprised of web-crawled texts contain undesirable social biases which can be perpetuated or even amplified by LLMs. In this study, we propose an efficient yet effective annotation pipeline to investigate social biases in the pretraining corpora. Our pipeline consists of protected attribute detection to identify diverse demographics, followed by regard classification to analyze the language polarity towards each attribute. Through our experiments, we demonstrate the effect of our bias analysis and mitigation measures, focusing on Common Crawl as the most representative pretraining corpus.

computational linguistic, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2504.14212

Country:

Asia > Middle East (0.68)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.49)

Industry:

Health & Medicine (1.00)
Law Enforcement & Public Safety (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Fairness at Every Intersection: Uncovering and Mitigating Intersectional Biases in Multimodal Clinical Predictions

Ramachandranpillai, Resmi, Sampath, Kishore, Mohammad, Ayaazuddin, Alikhani, Malihe

arXiv.org Artificial Intelligence | Nov-30-2024

Biases in automated clinical decision-making using Electronic Healthcare Records (EHR) impose significant disparities in patient care and treatment outcomes. Conventional approaches have primarily focused on bias mitigation strategies stemming from single attributes, overlooking intersectional subgroups -- groups formed across various demographic intersections (such as race, gender, ethnicity, etc.). Applying single-attribute mitigation strategies to intersectional subgroups becomes statistically irrelevant due to the varying distribution and bias patterns across these subgroups. The multimodal nature of EHR -- data from various sources such as combinations of text, time series, tabular, events, and images -- adds another layer of complexity as the influence on minority groups may fluctuate across modalities. In this paper, we take the initial steps to uncover potential intersectional biases in predictions by sourcing extensive multimodal datasets, MIMIC-Eye and MIMIC-IV ED, and propose mitigation at the intersectional subgroup level. We perform and benchmark downstream tasks and bias evaluation on the datasets by learning a unified text representation from multimodal sources, harnessing the enormous capabilities of the pre-trained clinical Language Models (LM), MedBERT, Clinical BERT, and Clinical BioBERT. Our findings indicate that the proposed sub-group-specific bias mitigation is robust across different datasets, subgroups, and embeddings, demonstrating effectiveness in addressing intersectional biases in multimodal settings.

data mining, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.00606

Country:

North America > United States (0.04)
Europe > Portugal > Guarda > Guarda (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Diagnostic Medicine > Imaging (0.47)
Health & Medicine > Health Care Technology > Medical Record (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Distributed Least Squares in Small Space via Sketching and Bias Reduction

Garg, Sachin, Tan, Kevin, Dereziński, Michał

arXiv.org Artificial Intelligence | May-8-2024

Matrix sketching is a powerful tool for reducing the size of large data matrices. Yet there are fundamental limitations to this size reduction when we want to recover an accurate estimator for a task such as least squares regression. We show that these limitations can be circumvented in the distributed setting by designing sketching methods that minimize the bias of the estimator, rather than its error. In particular, we give a sparse sketching method running in optimal space and current matrix multiplication time, which recovers a nearly-unbiased least squares estimator using two passes over the data. This leads to new communication-efficient distributed averaging algorithms for least squares and related tasks, which directly improve on several prior approaches. Our key novelty is a new bias analysis for sketched least squares, giving a sharp characterization of its dependence on the sketch sparsity. The techniques include new higher-moment restricted Bai-Silverstein inequalities, which are of independent interest to the non-asymptotic analysis of deterministic equivalents for random matrices that arise from sketching.

estimator, inequality, matrix, (13 more...)

arXiv.org Artificial Intelligence

2405.05343

Country:

North America > United States > Michigan (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Massachusetts > Plymouth County > Hanover (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)

Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning

Moutakanni, Théo, Bojanowski, Piotr, Chassagnon, Guillaume, Hudelot, Céline, Joulin, Armand, LeCun, Yann, Muckley, Matthew, Oquab, Maxime, Revel, Marie-Pierre, Vakalopoulou, Maria

arXiv.org Artificial Intelligence | May-2-2024

AI foundation models are gaining traction in various applications, including medical fields like radiology. However, medical foundation models are often tested on limited tasks, leaving their generalisability and biases unexplored. We present RayDINO, a large visual encoder trained by self-supervision on 873k chest X-rays. We compare RayDINO to previous state-of-the-art models across nine radiology tasks, from classification and dense segmentation to text generation, and provide an in-depth analysis of the population, age, and sex biases of our model. Our findings suggest that self-supervision allows patient-centric AI proving useful in clinical workflows and interpreting X-rays holistically. With RayDINO and small task-specific adapters, we reach state-of-the-art results and improve generalization to unseen populations while mitigating bias, illustrating the true promise of foundation models: versatility and robustness.

dataset, raydino, x-ray, (16 more...)

arXiv.org Artificial Intelligence

2405.01469

Country:

North America > United States (0.30)
South America > Brazil (0.14)
Asia > Vietnam (0.05)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
(3 more...)

Improving Conversational Recommendation Systems via Bias Analysis and Language-Model-Enhanced Data Augmentation

Wang, Xi, Rahmani, Hossein A., Liu, Jiqun, Yilmaz, Emine

arXiv.org Artificial Intelligence | Oct-25-2023

Conversational Recommendation System (CRS) is a rapidly growing research area that has gained significant attention alongside advancements in language modelling techniques. However, the current state of conversational recommendation faces numerous challenges due to its relative novelty and limited existing contributions. In this study, we delve into benchmark datasets for developing CRS models and address potential biases arising from the feedback loop inherent in multi-turn interactions, including selection bias and multiple popularity bias variants. Drawing inspiration from the success of data generation with language models and of data augmentation techniques, we present two novel strategies, 'Once-Aug' and 'PopNudge', to enhance model performance while mitigating biases. Through extensive experiments on the ReDial and TG-ReDial benchmark datasets, we show a consistent improvement of CRS techniques with our data augmentation approaches and offer additional insights on addressing multiple newly formulated biases.

analysis and language-model-enhanced data augmentation, bias analysis, conversational recommendation system

arXiv.org Artificial Intelligence

2310.16738

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.60)
Information Technology > Artificial Intelligence > Natural Language (0.60)

Generalizable Natural Language Processing Framework for Migraine Reporting from Social Media

Guo, Yuting, Rajwal, Swati, Lakamana, Sahithi, Chiang, Chia-Chun, Menell, Paul C., Shahid, Adnan H., Chen, Yi-Chieh, Chhabra, Nikita, Chao, Wan-Ju, Chao, Chieh-Ju, Schwedt, Todd J., Banerjee, Imon, Sarker, Abeed

arXiv.org Artificial Intelligence | Dec-23-2022

Migraine is a high-prevalence and disabling neurological disorder. However, information on migraine management in real-world settings could be limited to traditional health information sources. In this paper, we (i) verify that there is substantial migraine-related chatter available on social media (Twitter and Reddit), self-reported by migraine sufferers; (ii) develop a platform-independent text classification system for automatically detecting self-reported migraine-related posts; and (iii) conduct analyses of the self-reported posts to assess the utility of social media for studying this problem. We manually annotated 5750 Twitter posts and 302 Reddit posts. Our system achieved an F1 score of 0.90 on Twitter and 0.93 on Reddit. Analysis of information posted by our 'migraine cohort' revealed the presence of a plethora of relevant information about migraine therapies and patient sentiments associated with them. Our study forms the foundation for conducting an in-depth analysis of migraine-related information using social media data.

artificial intelligence, social media, tweet, (16 more...)

arXiv.org Artificial Intelligence

2212.12454

Country:

North America > United States > Texas (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Minnesota > Olmsted County > Rochester (0.04)
(8 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology > Headaches (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

Race Bias Analysis of Bona Fide Errors in face anti-spoofing

Abduh, Latifah, Ivrissimtzis, Ioannis

arXiv.org Artificial Intelligence | Oct-11-2022

Face recognition is the method of choice behind some of the most widely deployed biometric authentication systems, currently supporting a range of applications, from passport control at airports to mobile phone or laptop login. A key weakness of the technology, preventing it from being employed in security-sensitive applications in uncontrolled environments, such as ATMs for money withdrawal, is its vulnerability to presentation attacks, where imposters attempt to gain wrongful access by presenting in front of the system's camera a photo or a video, or by wearing a mask resembling a registered person. As a solution to this problem, algorithms for presentation attack detection (PAD) are developed, that is, binary classifiers trained to distinguish between the bona fide samples coming from live subjects and those coming from imposters. The large variety in the types of possible presentation attacks, and the large variation in the environmental conditions under which they might take place, make PAD a particularly challenging problem. However, the current state of the art, utilising the power of deep learning, comprises classifiers with excellent accuracy rates and a satisfactory generalisation power to at least a limited number of previously unseen attacks. Cross-database generalisation is still problematic; however, it is debatable whether this is a real obstacle to the deployment of PAD algorithms in practical applications, since such algorithms are usually embedded in specific face recognition systems, with given camera specifications and configurations. Here, we deal with the problem of race bias in face anti-spoofing algorithms. It is a topic that has attracted considerably less research interest than accuracy and generalisation power, despite the fact that it raises ethical, legal, and regulatory considerations, which, on their own, can prevent adoption in specific applications. Addressing this gap, the aim of this paper is to provide a framework for studying the question: Does the classifier work equally well on people from all races?
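
One simple way to start probing the question posed in this abstract is to compare, per demographic group, how often bona fide samples are wrongly rejected. The sketch below is only an illustration under stated assumptions and is not the framework proposed by the paper's authors: the function name `bona_fide_error_rates`, the record layout, the group labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def bona_fide_error_rates(records):
    """Return, per group, the fraction of bona fide samples wrongly rejected.

    Each record is a hypothetical tuple: (group, is_bona_fide, accepted).
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, is_bona_fide, accepted in records:
        if not is_bona_fide:
            continue  # attack samples are outside the scope of this metric
        totals[group] += 1
        if not accepted:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical toy data, made up purely for illustration.
records = [
    ("group_a", True, True), ("group_a", True, True),
    ("group_a", True, False), ("group_a", True, True),
    ("group_b", True, True), ("group_b", True, False),
    ("group_b", True, False), ("group_b", True, True),
    ("group_b", False, False),  # an attack sample, ignored by the metric
]

rates = bona_fide_error_rates(records)
# Largest difference in bona fide error rate between any two groups.
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

A gap between groups on a small sample is not evidence of bias by itself; a real analysis would also need a statistical test and enough samples per group.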

artificial intelligence, classifier, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2210.05366

Country:

Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
Asia > East Asia (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

AWS announces SageMaker Clarify to help reduce bias in machine learning models – TechCrunch

#artificialintelligence | Dec-8-2020, 16:45:30 GMT

As companies rely increasingly on machine learning models to run their businesses, it's imperative to include anti-bias measures to ensure these models are not making false or misleading assumptions. Today at AWS re:Invent, AWS introduced Amazon SageMaker Clarify to help reduce bias in machine learning models. "We are launching Amazon SageMaker Clarify. And what that does is it allows you to have insight into your data and models throughout your machine learning lifecycle," Bratin Saha, Amazon VP and general manager of machine learning told TechCrunch. He says that it is designed to analyze the data for bias before you start data prep, so you can find these kinds of problems before you even start building your model.

clarify, help reduce bias, sagemaker clarify, (8 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets

Yang, Jianing, Zhu, Yuying, Wang, Yongxin, Yi, Ruitao, Zadeh, Amir, Morency, Louis-Philippe

arXiv.org Machine Learning | Jul-7-2020

Question answering biases in video QA datasets can mislead multimodal models into overfitting to QA artifacts and jeopardize the models' ability to generalize. Understanding how strong these QA biases are and where they come from helps the community measure progress more accurately and provides researchers with insights to debug their models. In this paper, we analyze QA biases in popular video question answering datasets and discover that pretrained language models can answer 37-48% of questions correctly without using any multimodal context information, far exceeding the 20% random guess baseline for 5-choose-1 multiple-choice questions. Our ablation study shows that biases can come from annotators and from the type of question. Specifically, annotators that have been seen during training are better predicted by the model, and reasoning, abstract questions incur more biases than factual, direct questions. We also show empirically that using annotator-non-overlapping train-test splits can reduce QA biases for video QA datasets.
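
The annotator-non-overlapping split mentioned at the end of this abstract can be sketched as grouping examples by annotator and assigning whole annotators to either train or test. This is a minimal illustration, not the authors' code: the function name `annotator_disjoint_split`, the `annotator_id` field, the 20% test fraction, and the toy examples are assumptions made for the example.

```python
import random
from collections import defaultdict


def annotator_disjoint_split(examples, test_fraction=0.2, seed=0):
    """Split examples so that no annotator appears in both train and test."""
    by_annotator = defaultdict(list)
    for example in examples:
        by_annotator[example["annotator_id"]].append(example)

    # Shuffle annotators (not individual examples) with a fixed seed.
    annotators = sorted(by_annotator)
    random.Random(seed).shuffle(annotators)

    # Hold out at least one annotator for the test set.
    n_test = max(1, round(len(annotators) * test_fraction))
    test = [ex for a in annotators[:n_test] for ex in by_annotator[a]]
    train = [ex for a in annotators[n_test:] for ex in by_annotator[a]]
    return train, test


# Hypothetical toy data: 20 questions written by 5 annotators.
examples = [{"annotator_id": f"a{i % 5}", "question": f"q{i}"} for i in range(20)]
train, test = annotator_disjoint_split(examples)

train_ids = {ex["annotator_id"] for ex in train}
test_ids = {ex["annotator_id"] for ex in test}
print(len(train), len(test), train_ids & test_ids)
```

Splitting by annotator rather than by example means the test set size depends on how many examples each held-out annotator wrote, so the realized test fraction can differ from `test_fraction` when annotators contribute unevenly.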

annotator, machine learning, question answering, (19 more...)

arXiv.org Machine Learning

2007.03626

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.94)
